support C++20 Modules #19940

Open
wants to merge 6 commits into
base: master

PikachuHyA
Contributor

This PR implements support for C++20 Modules in Bazel.

the design doc: bazelbuild/proposals#354

the discussion: #19939

the demo: https://github.com/PikachuHyA/async_simple

the extra tests: https://github.com/PikachuHyA/bazel_cxx20_module_test

see #4005

@PikachuHyA PikachuHyA requested review from gregestren and removed request for a team October 25, 2023 11:08
@google-cla

google-cla bot commented Oct 25, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions github-actions bot added awaiting-review PR is awaiting review from an assigned reviewer team-Configurability platforms, toolchains, cquery, select(), config transitions team-Rules-CPP Issues for C++ rules labels Oct 25, 2023
@lberki lberki requested review from comius and removed request for oquenchil, ahumesky, ted-xie and gregestren October 25, 2023 13:32
@comius
Contributor

comius commented Oct 25, 2023

This PR already got a lot of attention at Google in the group of C++ toolchain maintainers / experts. There’s a desire to have it, but no concrete/incompatible plans yet. The design would need some changes so that it’s compatible and supports Google well. (Think of easier maintenance in the future)

I’m not an expert in C++, but I will start the discussion internally and come back with possible requirements/changes when we figure out what they are.

@comius comius self-assigned this Oct 27, 2023
@comius
Contributor

comius commented Oct 27, 2023

Some people are out of office. The main discussion will start second week of November. I’ll post next update after that.

@sgowroji sgowroji added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 8, 2023
@PikachuHyA PikachuHyA force-pushed the cxx20-modules-support branch 2 times, most recently from 826867b to cf2c9ad Compare November 16, 2023 08:32
@PikachuHyA
Contributor Author

I rebased the PR onto the latest master branch due to a MODULE.bazel.lock conflict.

@PikachuHyA
Contributor Author

> Some people are out of office. The main discussion will start second week of November. I’ll post next update after that.

gentle ping :-)

@mathstuf

CMake developer here; just tracking how modules are being implemented in various places :) .

I read through the design doc and had a few comments. Since it was already merged, I figured that here may be better; can move wherever is best though.

  • Two-phase compilation is only supported by Clang. With the work ongoing to make smaller BMIs, a .pcm → .o rule may not be so feasible in the future. There may be a way to do .full.pcm → .importable.pcm / .full.pcm → .o though? In any case, this is something the build system can hide away from the user interface pretty easily.
  • I notice that references are not tracked in .CXXModules.json files. This was found to be necessary in CMake for MSVC where BMIs contain no transitive references to the BMIs they need. GCC still embeds them; Clang is deprecating it. This helps the reproducible case but means that the build system needs to track transitive imports to specify in the .modmap files when using modules. As an example, if the module import looks like leaf → intermediate → impl → detail, the P1689 is only going to report one level at each scan (i.e., intermediate's .ddi file won't specify detail unless directly imported). The .CXXModules.json must somehow store "I see an import of intermediate; impl and detail need specified as well".
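
The transitive-import point can be illustrated with a minimal sketch. It assumes the build system has already collected each module's direct imports (one level per scan, as P1689 reports them) into a dictionary; the function name `transitive_imports` and the dictionary layout are hypothetical and are not part of this PR.

```python
from typing import Dict, List, Set


def transitive_imports(direct: Dict[str, List[str]], root: str) -> List[str]:
    """Return every module reachable from `root`, in first-seen order."""
    seen: Set[str] = set()
    order: List[str] = []
    # Start from the direct imports of the root module.
    stack = list(reversed(direct.get(root, [])))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        # Visit this module's own direct imports next.
        stack.extend(reversed(direct.get(name, [])))
    return order


# Direct imports as a per-source scan would report them (hypothetical data).
direct_imports = {
    "leaf": ["intermediate"],
    "intermediate": ["impl"],
    "impl": ["detail"],
    "detail": [],
}

# A source importing `leaf` needs all three BMIs listed in its modmap.
print(transitive_imports(direct_imports, "leaf"))
```

For a compiler whose BMIs carry no transitive references, every name in the returned list would need an entry in the modmap, not only the directly imported module.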

@ChuanqiXu9

ChuanqiXu9 commented Nov 18, 2023

> a .pcm → .o rule may not be so feasible in the future.

No, Clang doesn't have such plans (deprecating the 2-phase compilation model), at least for now.

> Two-phase compilation is only supported by Clang.

Yes, but the story of the 2-phase compilation model seems really appealing, so a build system that supports it may have an advantage. And in the future, the build systems may be able to support both (or even more) compilation models and the users can make the choice.

@mathstuf

> No, Clang doesn't have such plans (deprecating the 2-phase compilation model), at least for now.

To be more precise, there may be multiple kinds of BMIs in the future and Clang may have a 3-phase with the trimmed BMI being the "interesting" bit for importers in the future, but still using the full BMI for codegen. Clang is also getting a (proper rather than "frontend does the 2-phase internally" of today) 1-phase compilation like GCC and MSVC as well.

> Yes but the story of the 2 phase compilation model seems really appealing.

I agree. However, I prioritized 1-phase over 2-phase for CMake due to compiler support.

> And in the future, the build systems may be able to support both (or even more) compilation models and the users can make the choice.

Agreed. However, given the simplicity of the 1-phase, I find it better for the initial implementation. There are a number of performance things that can be looked at in the future:

  • only-if-changed on more minimal BMI files
  • target-wide batch scanning
  • grouped target-wide batch scanning
  • grouped target-wide batch collation
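
As an illustration of the first item, "only-if-changed" usually means leaving an output file untouched when its new content is byte-identical, so timestamp-based or content-based rebuild checks downstream see no change. The sketch below is a generic version of that idea; the helper name `write_if_changed` is hypothetical and it is not taken from CMake or Bazel.

```python
import tempfile
from pathlib import Path


def write_if_changed(path: Path, data: bytes) -> bool:
    """Write `data` to `path` only when the content differs.

    Returns True when the file was (re)written, False when it was left alone.
    """
    if path.exists() and path.read_bytes() == data:
        return False
    path.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    bmi = Path(tmp) / "a.pcm"
    first = write_if_changed(bmi, b"interface v1")   # new file
    second = write_if_changed(bmi, b"interface v1")  # identical content
    third = write_if_changed(bmi, b"interface v2")   # changed content

print(first, second, third)
```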

Basically my main interest is in getting things working across the ecosystem as a baseline before we start up our ricer cars. Of course, Bazel can do as they please; I can only offer my view on things here.

@mathstuf

This issue was filed against CMake. Unconditional redirection of clang-scan-deps may be unwise in the case of a failed scan. Maybe Bazel doesn't care given its execution strategies, but it is something to consider at least.


@mathstuf mathstuf left a comment

You might want to consider a way to tell Bazel that some sources do not use modules and can therefore completely skip scanning (and, if nothing in the target needs scanned, the target's collation step as well).

Comment on lines +360 to +361
// if cpp20_module enabled, only c++20-deps-scanning will produce .d file
// other actions will reuse the .d file from c++20-deps-scanning

I don't think this is accurate as the "real" compile may mention other files in its .d output. Including, but not limited to:

  • the BMI files that are read (or only those that are used)
  • modmap files
  • header units which are translated into imports may stop reading the header and read the BMI directly

The last one should be covered by a header change triggering a rescan. Not listing it here lets the build graph skip the compile when the change is inconsequential to it, by waiting for the scan to say so rather than queuing up the compile automatically.

Preconditions.checkState(module.isFileType(CppFileTypes.CPP_MODULE), "Non-module? %s", module);
var skyValue = actionExecutionValues.get(module.getGeneratingActionKey());
if (skyValue == null) {
return null;

This seems like a problematic error case; no messages or context about what happened?

public CppCompileActionBuilder setPcmFiles(NestedSet<Artifact.DerivedArtifact> pcmFiles) {
this.pcmFiles = pcmFiles;
return this;
}

Indentation seems weird here. Also looks like a missing newline after this brace.

Comment on lines +96 to +98
<li>Clang use cppm </li>
<li>GCC can use any source file extension </li>
<li>MSVC use ixx </li>

All three can use any extension with the right flags (e.g., -x c++-module or -interface/-interfacePartition). These are the preferred extensions.

var scanDepsBuilder = initializeCompileAction(sourceArtifact);
scanDepsBuilder.setActionName(CppActionNames.CPP20_DEPS_SCANNING);
scanDepsBuilder.setOutputs(ddiFile, dotdFile, null);
// only c++20-deps-scanning add .d file

As noted elsewhere, this seems unwise.

content.append("module-file=");
content.append(moduleName);
content.append("=");
content.append(modulePath);

Does any escaping mechanism need to be considered (e.g., spaces in the path)?

Comment on lines +48 to +49
@SerializedName("source-path")
private String sourcePath;

Future header unit support would require reading use-source-path (bool) and lookup-method (enum). It might be prudent to read these now and fail gracefully with a message about header units not being supported.
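
A minimal sketch of the "fail gracefully" idea, written in Python rather than the PR's Java. It assumes that a `provides` entry for a named module either omits `lookup-method` or sets it to `by-name`, and that any other value indicates a header unit; that reading of the scan format, the exception class and the function name are all assumptions for illustration.

```python
class UnsupportedHeaderUnit(ValueError):
    """Raised when a scan result describes a header unit."""


def read_provides(entry: dict) -> str:
    """Return the logical name of a named module, rejecting header units."""
    # Assumption: "by-name" (or an absent key) means a named module.
    lookup = entry.get("lookup-method", "by-name")
    if lookup != "by-name":
        raise UnsupportedHeaderUnit(
            "header units are not supported yet "
            f"(lookup-method={lookup!r}, source-path={entry.get('source-path')!r})"
        )
    return entry["logical-name"]


print(read_provides({"logical-name": "a", "source-path": "path/to/a.cppm"}))
```

Raising a dedicated error type keeps the message about missing header unit support separate from generic JSON parsing failures.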

@comius
Contributor

comius commented May 22, 2024

TL;DR The Bazel team has decided to accept this PR, I'll be doing the reviews and I'll get some help from internal C++ experts, namely @trybka.

We identified the following risks:

  • increase in maintenance cost for the Bazel team
  • divergent implementations in Bazel and at Google, or no implementation at Google
  • newly introduced complexity in CppCompileAction

We'd like to keep the maintenance costs to a minimum: the Bazel team will only review PRs after the initial community review. We won't address any issues that are reported, but we don't mind if the community addresses them.

We'd like to keep the change behind an experimental flag, to mitigate the risk of divergent implementations. While the change is under the experimental flag, there is no guarantee about incompatible changes. If Google does an internal implementation, we'd like it to match, to reduce maintenance costs.

We'd also like to make the change as "modular" as possible, in order to make it easier to remove in the future. That might happen in the unlikely scenario that Google doesn't implement support for C++20 modules and this remains the only complexity in CppCompileAction that can't be rewritten in Starlark. If that scenario plays out, the C++20 modules support will probably need to be implemented in a different way.

That said, we do see the benefits of this change for both the community and Google. Thank you for your contribution.

@PikachuHyA
Contributor Author

> For the sake of improving the quality of the review, do you think you could break this XXL PR into several digestible pieces? I'll take care that each piece is reviewed in a couple of business days.

Hi @comius, I have split this XXL PR into 6 smaller commits. Initially, I hoped to divide it into independent small patches (see #22425, #22427), but that proved infeasible due to dependencies between the patches (#22429). I now plan to use stacked PRs to facilitate code review. However, stacked PRs require creating branches in the target repository first, and I'm not sure whether I could be granted the necessary permissions. I've also created a demo of stacked PRs in my repository (https://github.com/PikachuHyA/bazel/pulls) as a backup.

Do you have any suggestions on the code review process?

BTW, the Windows CI is broken; I will fix it later.

@PikachuHyA
Contributor Author

@mathstuf Thanks for your comments.

I will make the related code changes as soon as possible.

@mathstuf

> Later, I plan to use stacked PRs to facilitate code review. However, stacked PRs require creating branches in the target repository first,

Nothing should require that; tools doing so should…work on that. It's kind of crazy to make tools not available for external contributors to projects. I believe https://stacked-git.github.io/ does most of its work locally so that at least you're not tied to any Github limitations.

copybara-service bot pushed a commit that referenced this pull request Jun 14, 2024
I split the XXL PR #19940 into several small patches.
This is the first patch of Support C++20 Modules; it only adds the `module_interfaces` attribute.

Example:

- foo.cppm
```
// foo.cppm
export module foo;
// ...
```
- BUILD.bazel

```
cc_library(
    name="foo",
    copts=["-std=c++20"],
    module_interfaces=["foo.cppm"],
    # features=["cpp20_modules"]
)

```

Without the flag, the build fails with the following message:

```
➜  bazel build :foo
ERROR: bazel_demo/BUILD.bazel:1:11: in cc_library rule //:foo:
Traceback (most recent call last):
        File "/virtual_builtins_bzl/common/cc/cc_library.bzl", line 40, column 42, in _cc_library_impl
        File "/virtual_builtins_bzl/common/cc/semantics.bzl", line 123, column 13, in _check_can_module_interfaces
Error in fail: attribute module_interfaces: requires --experimental_cpp20_modules
ERROR: bazel_demo/BUILD.bazel:1:11: Analysis of target '//:foo' failed
ERROR: Analysis of target '//:foo' failed; build aborted
INFO: Elapsed time: 0.106s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
```
To build with C++20 Modules, the flag `--experimental_cpp20_modules` must be added.

```
➜  bazel build :foo --experimental_cpp20_modules
ERROR: bazel_demo/BUILD.bazel:1:11: in cc_library rule //:foo:
Traceback (most recent call last):
        File "/virtual_builtins_bzl/common/cc/cc_library.bzl", line 41, column 34, in _cc_library_impl
        File "/virtual_builtins_bzl/common/cc/cc_helper.bzl", line 1225, column 13, in _check_cpp20_modules
Error in fail: to use C++20 Modules, the feature cpp20_modules must be enabled
ERROR: bazel_demo/BUILD.bazel:1:11: Analysis of target '//:foo' failed
ERROR: Analysis of target '//:foo' failed; build aborted
INFO: Elapsed time: 0.091s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
ERROR: Build did NOT complete successfully
```

To build with C++20 Modules, the feature `cpp20_modules` must be enabled.

```
bazel build :foo --experimental_cpp20_modules --features cpp20_modules
```

The flag `--experimental_cpp20_modules` applies globally, while the feature `cpp20_modules` applies per target.

This patch only adds the attribute; it does nothing with C++20 Module Interfaces yet.

Closes #22425.

PiperOrigin-RevId: 643303029
Change-Id: I08d8a1186d2ddd1c632f1e768442e504b87a0691
copybara-service bot pushed a commit that referenced this pull request Jun 14, 2024
This patch adds `compiler_input_flags_feature` and `compiler_output_flags_feature` to the features.

Follows #22717.

By default, the features `compiler_input_flags_feature` and `compiler_output_flags_feature` are included through `CppActionConfigs.java` in the `getFeaturesToAppearLastInFeaturesList` method.

For reference, see the relevant code here:

https://github.com/bazelbuild/bazel/blob/0dbfaccaf5bee5ea7f11c01db1fc0cd1ca7f3810/src/main/java/com/google/devtools/build/lib/rules/cpp/CppActionConfigs.java#L1513-L1573

## Background

I modified `tools/cpp/unix_cc_toolchain_config.bzl` and found no input and output on macOS when testing #19940 with the new action names `c++20-deps-scanning` and `c++20-module-compile`.

As discussed in #22429 (comment), I added these two features to `unix_cc_toolchain_config.bzl`.

The Windows toolchains already have these features, so no modifications were necessary for `windows_cc_toolchain_config.bzl`.

- Windows input flags:

https://github.com/bazelbuild/bazel/blob/786a893ef6f69a8f77ca008a478bf67abfdcdc57/tools/cpp/windows_cc_toolchain_config.bzl#L1073-L1095

- Windows output flags:

https://github.com/bazelbuild/bazel/blob/786a893ef6f69a8f77ca008a478bf67abfdcdc57/tools/cpp/windows_cc_toolchain_config.bzl#L960-L1020

cc @comius

Closes #22743.

PiperOrigin-RevId: 643345702
Change-Id: I5715d25e12c7a3616d1fdb484f77ef7cd0fd1bba
copybara-service bot pushed a commit that referenced this pull request Jun 14, 2024
This patch adds `dependency_file_feature` to the features when the OS is macOS.

The feature `dependency_file_feature` is added by default through the `getLegacyFeatures` method in `CppActionConfigs.java`:

https://github.com/bazelbuild/bazel/blob/0dbfaccaf5bee5ea7f11c01db1fc0cd1ca7f3810/src/main/java/com/google/devtools/build/lib/rules/cpp/CppActionConfigs.java#L93-L117

## Background

I modified `tools/cpp/unix_cc_toolchain_config.bzl` and found that `dependency_file` did not work on macOS when testing #19940 with the new action names `c++20-deps-scanning` and `c++20-module-compile`. After adding `dependency_file_feature` to the features, it works.

cc @comius

Closes #22717.

PiperOrigin-RevId: 643345857
Change-Id: I50210592edd1082e2328c7e4ab68bd0c76087aaa
@PikachuHyA
Contributor Author

Hi @peakschris (#22425 (comment)),

Thanks very much for your interest in this PR.

> what is the status of this? I mocked up modules in a non-bazel environment and it looks like it could substantially improve our build times, so I'm eagerly anticipating this :-)

I have completed the one-phase compilation support for GCC, Clang, and MSVC, as well as the two-phase compilation support for Clang.
I'm currently breaking this large PR into 5 smaller patches.

The first patch has already been merged.

The remaining patches need some time for review.

> I am looking forward to using C++20 modules in our bazel builds :-)

Thanks. Your feedback and anticipation are really valuable to us.

@jwhpryor

I am hugely excited to use this for my own project. Congrats on your hard work and perseverance.

@peakschris

Thanks @PikachuHyA, I asked the question here and then moved it to the closed PR that I found. It looks like this is an absolutely mammoth task, excellent effort!

copybara-service bot pushed a commit that referenced this pull request Aug 28, 2024
I split the XXL PR #19940 into several small patches.
This is the second patch of Support C++20 Modules; it adds the C++20-related tools.

## Overview

This patch contains two tools: `aggregate-ddi` and `generate-modmap`. These tools are designed to facilitate the processing of C++20 modules information and direct dependent information (DDI). They can aggregate module information, process dependencies, and generate module maps for use in C++20 modular projects.

## The format of DDI

The DDI content follows the [p1689](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1689r5.html) format.
For example:

```
{
  "revision": 0,
  "rules": [
    {
      "primary-output": "path/to/a.pcm",
      "provides": [
        {
          "is-interface": true,
          "logical-name": "a",
          "source-path": "path/to/a.cppm"
        }
      ],
      "requires": [
        {
          "logical-name": "b"
        }
      ]
    }
  ],
  "version": 1
}
```
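
To make the structure concrete, here is a minimal Python sketch that reads a DDI document shaped like the example above and lists, per rule, the primary output, the provided module names and the required module names. It relies only on the keys shown in the example; the function name `summarize_ddi` is hypothetical and this is not the code of the tools in this patch.

```python
import json

# The example DDI document from above, embedded so the sketch is runnable.
DDI_TEXT = """
{
  "revision": 0,
  "rules": [
    {
      "primary-output": "path/to/a.pcm",
      "provides": [
        {"is-interface": true, "logical-name": "a", "source-path": "path/to/a.cppm"}
      ],
      "requires": [
        {"logical-name": "b"}
      ]
    }
  ],
  "version": 1
}
"""


def summarize_ddi(text: str):
    """Return (primary-output, provided names, required names) for each rule."""
    doc = json.loads(text)
    summary = []
    for rule in doc.get("rules", []):
        provides = [p["logical-name"] for p in rule.get("provides", [])]
        requires = [r["logical-name"] for r in rule.get("requires", [])]
        summary.append((rule.get("primary-output"), provides, requires))
    return summary


for output, provides, requires in summarize_ddi(DDI_TEXT):
    print(output, "provides", provides, "requires", requires)
```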

## Tools

### `aggregate-ddi`

#### Description

`aggregate-ddi` is a tool that aggregates C++20 module information from multiple sources and processes DDI files to generate a consolidated output containing module paths and their dependencies.

#### Usage

```sh
aggregate-ddi -m <cpp20modules-info-file1> -m <cpp20modules-info-file2> ... -d <ddi-file1> <path/to/pcm1> -d <ddi-file2> <path/to/pcm2> ... -o <output-file>
```

#### Command Line Arguments

- `-m <cpp20modules-info-file>`: Path to a JSON file containing C++20 module information.
- `-d <ddi-file> <pcm-path>`: Path to a DDI file and its associated PCM path.
- `-o <output-file>`: Path to the output file where the aggregated information will be stored.

#### Example

```sh
aggregate-ddi -m module-info1.json -m module-info2.json -d ddi1.json /path/to/pcm1 -d ddi2.json /path/to/pcm2 -o output.json
```
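
The argument shape above (repeatable `-m`, repeatable two-value `-d`, single `-o`) can be mirrored with Python's `argparse`, which may help when scripting around the tool. This is only a sketch of the command-line shape described in this section, not the tool's actual source; the destination names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="aggregate-ddi")
    # -m may be repeated; each occurrence names one modules-info JSON file.
    parser.add_argument("-m", dest="module_infos", action="append",
                        default=[], metavar="INFO_FILE")
    # -d may be repeated; each occurrence takes a DDI file and its PCM path.
    parser.add_argument("-d", dest="ddis", action="append", nargs=2,
                        default=[], metavar=("DDI_FILE", "PCM_PATH"))
    # -o names the single aggregated output file.
    parser.add_argument("-o", dest="output", required=True,
                        metavar="OUTPUT_FILE")
    return parser


args = build_parser().parse_args([
    "-m", "module-info1.json", "-m", "module-info2.json",
    "-d", "ddi1.json", "/path/to/pcm1",
    "-d", "ddi2.json", "/path/to/pcm2",
    "-o", "output.json",
])
print(args.module_infos, args.ddis, args.output)
```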

### `generate-modmap`

#### Description

`generate-modmap` is a tool that generates a module map from a DDI file and C++20 modules information file. It creates two output files: one for the module map and one for the input module paths.

#### Usage

```sh
generate-modmap <ddi-file> <cpp20modules-info-file> <output-file> <compiler>
```

#### Command Line Arguments

- `<ddi-file>`: Path to the DDI file containing module dependencies.
- `<cpp20modules-info-file>`: Path to the JSON file containing C++20 modules information.
- `<output-file>`: Path to the output file where the module map will be stored.
- `<compiler>`: Compiler whose modmap format should be generated. Only `clang`, `gcc`, and `msvc-cl` are supported.

#### Example

```sh
generate-modmap ddi.json cpp20modules-info.json modmap clang
```

This command will generate two files:
- `modmap`: containing the module map.
- `modmap.input`: containing the module paths.
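
As a rough illustration of why the compiler argument matters, the sketch below formats one module-name-to-BMI mapping per compiler. It assumes Clang's `-fmodule-file=<name>=<path>` option, GCC's module mapper file lines of the form `<name> <path>`, and MSVC's `/reference <name>=<path>` option; the exact content that `generate-modmap` writes is not shown in this description and may differ. The function name is hypothetical.

```python
from typing import Dict, List


def modmap_lines(compiler: str, modules: Dict[str, str]) -> List[str]:
    """Format module-name -> BMI-path pairs for the given compiler."""
    lines = []
    for name, path in sorted(modules.items()):
        if compiler == "clang":
            lines.append(f"-fmodule-file={name}={path}")
        elif compiler == "gcc":
            lines.append(f"{name} {path}")
        elif compiler == "msvc-cl":
            lines.append(f"/reference {name}={path}")
        else:
            raise ValueError(f"unsupported compiler: {compiler}")
    return lines


print(modmap_lines("clang", {"b": "path/to/b.pcm"}))
```

None of these formats is escaped here, so a path containing spaces would need extra handling.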

Closes #22427.

PiperOrigin-RevId: 668488153
Change-Id: Icde51b498f1ecc5c1182427029d0a81ce7c2f686
copybara-service bot pushed a commit that referenced this pull request Aug 30, 2024
## Summary
I have split the XXL PR [#19940](#19940) into several smaller patches. This is the third patch to support C++20 Modules, which adds the `deps-scanner` tool and updates toolchains.

This patch includes:
1. New action names
2. File extensions
3. Build variables
4. Updated toolchains for compiling C++20 Modules

## Action Names
Three action names have been added:
- `c++-module-deps-scanning`
- `c++20-module-compile`
- `c++20-module-codegen`

When two-phase compilation is employed:
- `c++-module-deps-scanning`: Scans source files and retrieves C++20 Modules dependencies, storing them in `<filename>.ddi`.
- `c++20-module-compile`: Compiles the C++20 Modules Interfaces to a Built Module Interface (BMI), converting `<filename>.cppm` to `<filename>.pcm`.
- `c++20-module-codegen`: Compiles the BMI to an object file, converting `<filename>.pcm` to `<filename>.o`.

When one-phase compilation is employed:
- `c++-module-deps-scanning`: Operates similarly to two-phase compilation.
- `c++20-module-compile`: Compiles the C++20 Modules Interfaces directly to an object file `<filename>.o` and produces a BMI `<filename>.pcm` as a byproduct.
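
To make the two models concrete, the sketch below builds plain Clang command lines for a module interface with no imports. It assumes the Clang driver options `--precompile` and `-fmodule-output=`; the command lines that Bazel's toolchain actually emits are not shown in this description and will differ (for example, they carry a modmap). The function names are hypothetical.

```python
from typing import List


def two_phase_commands(stem: str) -> List[List[str]]:
    """Interface -> BMI, then BMI -> object file."""
    return [
        ["clang++", "-std=c++20", "--precompile", f"{stem}.cppm", "-o", f"{stem}.pcm"],
        ["clang++", "-std=c++20", "-c", f"{stem}.pcm", "-o", f"{stem}.o"],
    ]


def one_phase_command(stem: str) -> List[str]:
    """Interface -> object file, with the BMI written as a byproduct."""
    return [
        "clang++", "-std=c++20", "-c", f"{stem}.cppm",
        f"-fmodule-output={stem}.pcm", "-o", f"{stem}.o",
    ]


for command in two_phase_commands("foo"):
    print(" ".join(command))
print(" ".join(one_phase_command("foo")))
```

In the two-phase model, importers only wait for the first command, which is where the extra build parallelism comes from.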

## File Extensions
We follow the file extensions preferred by different compilers, adding two new `ArtifactCategory`s: `CPP_MODULE_GCM` and `CPP_MODULE_IFC`.

- Clang uses `.pcm` (CPP_MODULE, already exists).
- GCC uses `.gcm` (CPP_MODULE_GCM, new).
- MSVC uses `.ifc` (CPP_MODULE_IFC, new).

Following the CMake implementation, we added three extra `ArtifactCategory`s: `CPP_MODULES_INFO`, `CPP_MODULES_DDI`, and `CPP_MODULES_MODMAP`.

- The `.ddi` file (CPP_MODULES_DDI) stores the dependencies information of one source file.
- The `.CXXModules.json` file (CPP_MODULES_INFO) stores dependencies information for an entire target.
- The `.modmap` file (CPP_MODULES_MODMAP) maps module names to BMIs, with different formats for each compiler.

Additionally, a special `ArtifactCategory`, `CPP_MODULES_MODMAP_INPUT`, is an auxiliary file used to easily obtain the requested BMI paths.
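
The extension choice can be summarized in a few lines of Python. The mapping below restates the list above; the compiler keys and the helper name `bmi_path` are chosen for illustration only.

```python
from pathlib import PurePosixPath

# BMI extension preferred by each compiler, as listed above.
BMI_EXTENSION = {"clang": ".pcm", "gcc": ".gcm", "msvc-cl": ".ifc"}


def bmi_path(compiler: str, interface: str) -> str:
    """Derive the BMI file name for a module interface source file."""
    return str(PurePosixPath(interface).with_suffix(BMI_EXTENSION[compiler]))


print(bmi_path("gcc", "path/to/a.cppm"))
```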

## Build Variables
Two build variables, `CPP_MODULE_MODMAP_FILE` and `CPP_MODULE_OUTPUT_FILE`, have been added.

- `CPP_MODULE_MODMAP_FILE` specifies the path to the `.modmap` file and is used by the `cpp_module_modmap_file_feature`.
- `CPP_MODULE_OUTPUT_FILE` specifies the output name of the BMI when one-phase compilation is employed and is used by the `cpp20_module_compile_flags_feature`.

## Toolchains
Three action configs (`cpp_module_scan_deps`, `cpp20_module_compile`, and `cpp20_module_codegen`) have been added, corresponding to the action names section.

Two features (`cpp_module_modmap_file_feature` and `cpp20_module_compile_flags_feature`) have been added, corresponding to the build variables section.

Using C++20 Modules necessitates topological ordering for the compilation units. For more details, see the [Discovering Dependencies](https://clang.llvm.org/docs/StandardCPlusPlusModules.html#discovering-dependencies) section.
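
The ordering requirement can be sketched with Python's standard `graphlib`. The input maps each module to the set of modules it imports, as a scan step would report; the data and the function name `compile_order` are hypothetical, and this only illustrates the ordering constraint, not how Bazel schedules actions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def compile_order(requires: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every module comes after its imports."""
    # TopologicalSorter expects node -> predecessors, which matches
    # module -> imported modules.
    return list(TopologicalSorter(requires).static_order())


order = compile_order({"app": {"a"}, "a": {"b"}, "b": set()})
print(order)

# Cyclic imports are invalid and are reported as an error.
try:
    compile_order({"a": {"b"}, "b": {"a"}})
except CycleError as err:
    print("cycle detected:", err.args[1])
```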

Considering the various compilers, I have added the `deps-scanner` tool. The default implementation is a script wrapper that uses different scanning methods depending on the compiler. The wrapper `deps_scanner_wrapper` is generated by a template file `<compiler>_deps_scanner_wrapper.sh.tpl`. Three template files have been added:

- `clang_deps_scanner_wrapper.sh.tpl`
- `gcc_deps_scanner_wrapper.sh.tpl`
- `msvc_deps_scanner_wrapper.bat.tpl`

For a demonstration of how to scan C++20 dependencies, please refer to this [demo](https://github.com/PikachuHyA/cpp20_modules_scan_dependency_demo).

Closes #22429.

PiperOrigin-RevId: 669241384
Change-Id: Id9ee2f66cb075446d0c38e6a6c70786ad9b28022
@peakschris

peakschris commented Sep 1, 2024

Hi PikachuHyA, Congrats on getting the first 3 PRs merged, and thanks to reviewers :-) It appears that we need to wait for PR4 to be rebased and merged before we can experiment with this, is that correct?

@PikachuHyA
Contributor Author

> It appears that we need to wait for PR4 to be rebased and merged before we can experiment with this, is that correct?

Yes. Once #22553 is merged, Bazel will basically support C++20 Modules.

Labels
awaiting-review PR is awaiting review from an assigned reviewer awaiting-user-response Awaiting a response from the author team-Configurability platforms, toolchains, cquery, select(), config transitions team-Rules-CPP Issues for C++ rules
10 participants